Autonomous Discovery of Reliable Exception Rules
Abstract
This paper presents an autonomous algorithm for discovering exception rules from data sets. An exception rule, which is defined as a deviational pattern to a well-known fact, exhibits unexpectedness and is sometimes extremely useful in spite of its obscurity. Previous discovery approaches for this type of knowledge have neglected the problem of evaluating the reliability of the rules extracted from a data set. It is clear, however, that this question is essential for distinguishing knowledge from unreliable patterns without burdening the users. In order to circumvent these difficulties, we propose a probabilistic estimation approach in which we obtain an exception rule associated with a common sense rule in the form of a rule pair. Based on the normal approximations of the multinomial distributions, our approach discovers rule pairs which satisfy, with high confidence, all the specified conditions. The time efficiency of the discovery process is improved by newly derived stopping criteria. PEDRE, a data mining system based on our approach, has been validated using benchmark data sets from the machine learning community.
Introduction
In data mining, an association rule (Agrawal et al. 1996), which is a statement of a regularity in the form of a production rule, represents one of the most important classes of discovered knowledge due to its […] generality. An association rule can be classified into two categories: a common sense rule, which describes a regularity for numerous objects, and an exception rule, which represents, for a relatively small number of objects, a regularity different from a common sense rule (Suzuki & Shimura 1996) (Suzuki 1996). An exception rule exhibits unexpectedness and is often useful.
For instance, the rule “using a seat belt is risky for a child”, which represents exceptions to the well-known fact “using a seat belt is safe”, exhibited unexpectedness when it was discovered from car accident data several years ago, and it is still useful. Moreover, an exception rule is often beneficial since it differs from a common sense rule, which is often a basis for people’s daily activity. As another example, consider a species of poisonous mushrooms some of which are exceptionally edible. The exact description of the exceptions is highly beneficial since it enables the exclusive possession of the edible mushrooms.
Copyright © 1997, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.
Since an exception rule holds for a relatively small number of examples, distinguishing a reliable rule from a coincidental pattern is one of the most important issues in discovering this type of knowledge. However, this distinction was left to the users in previous discovery systems (Piatetsky-Shapiro & Matheus 1994) (Klösgen 1996) (Suzuki & Shimura 1996) (Suzuki 1996). Evaluation of confidence by the users depends on their subjective judgement, and it becomes unreliable and uncertain when the discovered rules are numerous. In order to circumvent these difficulties, we propose a novel approach in which exception rules are discovered according to their confidence level, based on the normal approximations of the multinomial distributions. This approach can be called autonomous, since an exception rule is discovered using neither the users’ confidence evaluation nor domain knowledge.
Description of the Problem
Let a data set contain n examples, each of which is expressed by m discrete attributes. An event representing, in propositional form, a single value assignment to an attribute will be called an atom.
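The data model just described (n examples over m discrete attributes, with an atom being a single attribute-value event) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not PEDRE's implementation: the names `Atom`, `holds_all`, `count`, `relative_frequency` and `conditional_frequency`, as well as the toy data, are hypothetical. The functions return plain relative frequencies (point estimates), which carry no information about confidence.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Mapping, Sequence

# Hypothetical representation: one example maps each of the m discrete
# attributes to a single value.
Example = Mapping[str, str]


@dataclass(frozen=True)
class Atom:
    """An event assigning a single value to an attribute (hypothetical helper)."""
    attribute: str
    value: str

    def holds(self, example: Example) -> bool:
        return example.get(self.attribute) == self.value


def holds_all(atoms: Sequence[Atom], example: Example) -> bool:
    """True if every atom holds; an empty conjunction is always true."""
    return all(atom.holds(example) for atom in atoms)


def count(data: Sequence[Example], atoms: Sequence[Atom]) -> int:
    """Number of examples in which the conjunction of atoms holds."""
    return sum(1 for example in data if holds_all(atoms, example))


def relative_frequency(data: Sequence[Example], atoms: Sequence[Atom]) -> Fraction:
    """Point estimate of the probability of a conjunction: count / n."""
    return Fraction(count(data, atoms), len(data))


def conditional_frequency(
    data: Sequence[Example], conclusion: Atom, premise: Sequence[Atom]
) -> Fraction:
    """Point estimate of p(conclusion | premise)."""
    n_premise = count(data, premise)
    if n_premise == 0:
        raise ValueError("the premise holds for no example")
    return Fraction(count(data, [*premise, conclusion]), n_premise)


# Made-up toy data (not from the paper): n = 4 examples, m = 3 attributes.
data = [
    {"belt": "yes", "age": "adult", "outcome": "safe"},
    {"belt": "yes", "age": "adult", "outcome": "safe"},
    {"belt": "yes", "age": "child", "outcome": "injured"},
    {"belt": "no", "age": "adult", "outcome": "injured"},
]

belt = Atom("belt", "yes")
safe = Atom("outcome", "safe")
print(relative_frequency(data, [belt]))           # 3/4
print(conditional_frequency(data, safe, [belt]))  # 2/3
```

Exact `Fraction` values are used so that the toy numbers can be checked by hand: three of the four examples satisfy `belt = yes`, and two of those three also satisfy `outcome = safe`.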
We define an association rule as a production rule whose premise is a conjunction of atoms and whose conclusion is a single atom. In this paper, we consider the problem of finding a set of rule pairs, each of which consists of an exception rule associated with a common sense rule. A rule pair $r(\mu,\nu)$ is defined as a pair of association rules as follows:
$$r(\mu,\nu) \equiv \left(A_\mu \rightarrow c,\; A_\mu \wedge B_\nu \rightarrow c'\right) \tag{1}$$
$$A_\mu \equiv a_1 \wedge a_2 \wedge \cdots \wedge a_\mu \tag{2}$$
$$B_\nu \equiv b_1 \wedge b_2 \wedge \cdots \wedge b_\nu \tag{3}$$
In learning from examples, simplicity and goodness-of-fit are considered the most common criteria for evaluating the goodness of a hypothesis (Smyth & Goodman 1992). For an association rule $A_\mu \rightarrow c$, these two criteria correspond to $p(A_\mu)$ and $p(c \mid A_\mu)$, respectively (Smyth & Goodman 1992). Existing methods for evaluating the simplicity and the goodness-of-fit of an association rule can be classified into two approaches: the single-expression approach, such as (Klösgen 1996) (Smyth & Goodman 1992), which assumes a single criterion defined by a combination of the two criteria, and the simultaneous approach, such as (Mannila et al. 1994), which specifies minimum thresholds for both criteria. We take the latter approach due to its generality, and specify thresholds for the simplicity and the goodness-of-fit of both rules. Here, in order to take the confidence level of the rules into account, we do not employ the probabilities $\hat{p}(A_\mu)$, $\hat{p}(c \mid A_\mu)$, $\hat{p}(A_\mu, B_\nu)$ and $\hat{p}(c' \mid A_\mu, B_\nu)$ obtained by point estimation from the data set. Instead, we obtain rule pairs whose true probabilities $p(A_\mu)$, $p(c \mid A_\mu)$, $p(A_\mu, B_\nu)$ and $p(c' \mid A_\mu, B_\nu)$ are greater than or equal to their respective thresholds with a probability of at least $1-\delta$:
$$\Pr\{p(A_\mu) \ge \theta^S_1\} \ge 1-\delta \tag{4}$$
$$\Pr\{p(c \mid A_\mu) \ge \theta^F_1\} \ge 1-\delta \tag{5}$$
$$\Pr\{p(A_\mu, B_\nu) \ge \theta^S_2\} \ge 1-\delta \tag{6}$$
$$\Pr\{p(c' \mid A_\mu, B_\nu) \ge \theta^F_2\} \ge 1-\delta \tag{7}$$
Consider the case in which the conditional probability $p(c' \mid B_\nu)$ is large. In such a case, the exception rule $A_\mu \wedge B_\nu \rightarrow c'$ can easily be guessed from $B_\nu \rightarrow c'$, which we call the reference rule, and is therefore not unexpected. We thus add the following condition to obtain truly unexpected exception rules:
$$\Pr\{p(c' \mid B_\nu) \le \theta^I_2\} \ge 1-\delta \tag{8}$$
From the above discussion, the problem dealt with in this paper can be described as discovering, from a data set, the rule pairs $r(\mu,\nu)$ that satisfy (4)–(8).
From: KDD-97 Proceedings.
Evaluation of Reliability
In data mining, the Chernoff bound and the normal approximations of the binomial distributions are frequently used in assessing the reliability of a discovered association rule (Agrawal et al. 1996) (Chan & Wong 1991) (Mannila et al. 1994) (Siebes 1994). However, these methods estimate a single probability and cannot be applied to our problem, since (5), (7) and (8) contain conditional probabilities. We therefore need to estimate the confidence region of the probabilities related to (4)–(8). First, atoms $D_1, D_2, \ldots,$ and $D_8$ are defined as follows:
Similar resources
A Q-learning Based Continuous Tuning of Fuzzy Wall Tracking
A simple, easy-to-implement algorithm is proposed to address the wall-tracking task of an autonomous robot. The robot should navigate in unknown environments, find the nearest wall, and track it solely based on locally sensed data. The proposed method benefits from coupling fuzzy logic and Q-learning to meet the requirements of autonomous navigation. Fuzzy if-then rules provide a reliable decision maki...
Towards Reliable Autonomous Agents
We are interested in producing reliable autonomous robots that can operate for extended periods of time in uncertain, dynamic environments. We have been developing methodologies and software tools to facilitate this, including the Task Control Architecture and probabilistic methods for representing and reasoning about uncertainty. The aim is to incrementally produce reliable behavior by adding ...
Mining Financial Data with Scheduled Discovery of Exception Rules
This paper shows preliminary results, on financial data, of an algorithm for discovering pairs of an exception rule and a common sense rule under a prespecified schedule. An exception rule, which represents a regularity of exceptions to a common sense rule, often exhibits interestingness. Discovery of pairs of an exception rule and a common sense rule under threshold scheduling has been success...
Unified Algorithm for Undirected Discovery of Exception Rules
This paper presents an algorithm that seeks every possible exception rule which violates a common sense rule and satisfies several assumptions of simplicity. Exception rules, which represent systematic deviation from common sense rules, are often found interesting. Discovery of pairs that consist of a common sense rule and an exception rule, resulting from undirected search for unexpected excep...
Data Mining Methods for Discovering Interesting Exceptions from an Unsupervised Table
In this paper, we survey efforts devoted to discovering interesting exceptions from data in data mining. An exception differs from the rest of data and thus is interesting and can be a clue for further discoveries. We classify methods into exception instance discovery, exception rule discovery, and exception structured-rules discovery and give a condensed and comprehensive introduction.